Introduction

For the past three years, the Florida Statewide Assessment Program
has been gathering data on basic cognitive skills in the mathematics and
communication domains from a sample of students in each of the sixty-seven
(67) districts in Florida. One of the primary uses of the assessment data
is to determine how students in a given district are progressing towards the
mastery of certain objectives. The basic question, 'How well is district
X doing?' is answered by comparing the district percentage with the state
average. The district performance indicator obtained in this way did not
control for differing community and student background inputs across districts
and might mistakenly or unjustly give the district blame or credit.

The community and student background inputs are measured by any number
of socio-economic or socio-cultural variables such as family income, parents'
educational levels and parents' occupations. These represent 'hard-to-change'
variables and in general are related to achievement to a greater degree than
are manipulable variables such as class size, teacher experience, etc. Thus,
any attempt to examine the effectiveness of a district's educational program
must control for the non-school variables in order that meaningful
interpretation can be made. On the basis of these findings, the Florida
Statewide Assessment Program has begun analyzing the assessment data while
taking into account differences in non-school variables across districts.

Statement of the Problem 

In the age of the electronic computer, many problems are being solved using
multiple correlation and regression techniques which would never have been
attempted had electronic computers not been available. However, with the
computer doing the calculations, problems can be solved without the
manipulation being fully understood by the person employing the technique.
Therefore, there is a need for a discussion of certain concepts of multiple
correlation and regression techniques to prevent the user of these
techniques from reaching erroneous conclusions. This article attempts to
fill this need.

In the course of the analysis, a typical problem encountered by the author
was whether to create complex variables which would account for interactions
between single input variables or to use more easily explained variables.
The purpose of this study was two-fold. First, it was to determine whether
the inclusion of quadratic and/or interaction terms in a regression model
would improve the prediction of district score as represented by the
multiple correlation. The second purpose of this investigation was to
illustrate the step-wise regression procedure while attempting to determine
the quadratic and/or interaction effects in the regression model.

Data for Analysis 

The total mean score in grade 6 for mathematics was selected as the
criterion (output) variable. There are sixty-seven (67) observed scores,
one for each school district. The score for each district was calculated
from the 1972-73 Statewide Assessment results and is shown together with its
standard deviation in Table 1.

To obtain a pool of potential prediction (input) variables, the lists of
variables contained in the Accreditation files, the Bureau of Finance files
and U.S. Census data were scanned for those variables which might relate to
achievement in the Statewide Assessment Program. The 13 predictor variables
finally selected for analysis are listed in Table 2, and their means and
standard deviations are given in Table 3.

Procedure for Analysis 

In order to compare the usefulness of first-order and second-order
regression equations, four models, I, II, III and IV, were developed as shown
in Table 4. Each model was developed using a step-wise multiple regression
program (BMD02R) on an IBM 370 computer. The computational details of the
method are illustrated below using the results for the linear Model I.

The first step is to select one of the thirteen (13) predictor variables.
One way to choose the first variable would be to perform thirteen (13) separate
simple regressions and compute the F-ratio using

$$F_i = \frac{\text{Mean sum of squares due to regression}}{\text{Mean sum of squares due to residual}} = \frac{R_{1.i}^2}{(1 - R_{1.i}^2)/(n-2)} = \frac{(n-2)\,\text{Regr SS}_i}{\text{Resd SS}_i} \qquad (1)$$

where $R_{1.i}$ denotes the multiple correlation coefficient between the
criterion variable X1 and the predictor variable Xi, i = 2, 3, ..., 13, 14.

The sum of squares (SS) in equation (1) has the subscript i to indicate which
Xi is the predictor variable. The F-value obtained from equation (1) can be
used to test the null hypothesis $H_0\colon \beta_i = 0$. The selection of one
of the thirteen (13) variables depends upon the magnitude of its F-value; the
variable with the highest F-value would be used as the predictor variable.
Table 5 indicates that X2 is the variable with the highest F-value.

In order for a variable to be included in the analysis, the F-value for
the variable must exceed some predetermined value. The preassigned F-value
can be set quite low; sometimes it is set as low as F = 0.001 so that one is
almost certain to get a variable included. In this analysis, significance of
$\beta_i$ was tested at an alpha ($\alpha$) level of 0.05. In order to be
significant with $\alpha = 0.05$, the F-value has to be higher than 4.00.
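
The first-step selection can be retraced numerically. The following is a minimal Python sketch, not the program used in the study: it applies the correlation form of the F-ratio, F = (n - 2) r^2 / (1 - r^2), to the squared simple correlations reported in Table 5 and picks the variable with the largest F. The names `r_squared`, `simple_f` and `F_CRITICAL` are illustrative assumptions.

```python
# Squared simple correlations between the criterion X1 and each predictor,
# as reported in Table 5 (keys are the variable numbers of Table 2).
r_squared = {2: 0.433, 3: 0.106, 4: 0.381, 5: 0.282, 6: 0.317, 7: 0.191,
             8: 0.075, 9: 0.231, 10: 0.226, 11: 0.135, 12: 0.173,
             13: 0.023, 14: 0.423}

N_DISTRICTS = 67      # one observation per school district
F_CRITICAL = 4.00     # pre-set F-value used in the text (alpha = 0.05)


def simple_f(r2, n):
    """F-ratio of a one-predictor regression, with 1 and n - 2 degrees of freedom."""
    return (n - 2) * r2 / (1.0 - r2)


f_values = {i: simple_f(r2, N_DISTRICTS) for i, r2 in r_squared.items()}

# The variable with the highest F-value enters first.
first_variable = max(f_values, key=f_values.get)

# Variables whose F-value does not exceed the pre-set value.
below_threshold = [i for i, f in f_values.items() if f <= F_CRITICAL]

print(first_variable, round(f_values[first_variable], 1))
print(below_threshold)
```

Under these inputs the sketch selects variable 2, in agreement with Table 5, and only variable 13 falls below the pre-set value of 4.00.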

The next step in the analysis consists of choosing the second predictor
variable to be included in the regression analysis. One way to do that is to
compute the partial correlation coefficients $r_{1i.2}$, i = 3, 4, ..., 13, 14,
using the formula

$$r_{1i.2} = \frac{r_{1i} - r_{12}\, r_{i2}}{\sqrt{(1 - r_{12}^2)(1 - r_{i2}^2)}}$$

The partial coefficients, $r_{1i.2}$, measure the relationship between the
criterion variable and each of the remaining predictor variables X3, X4, ...,
X14, while controlling for the variable X2. It was necessary to control for
X2 in order to take out its effect, since the variable X2 has already been
included in the equation in the first step of the analysis. The second
predictor variable to be included would be that variable which explains most
of the remaining variation in the criterion variable X1. This variable is the
one with the highest partial correlation.
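
As an illustration of how a first-order partial correlation is obtained from three simple correlations, here is a small Python sketch of the standard formula. The correlation between the two predictors is not reported in the text, so the value `r_62_hypothetical` below is a made-up placeholder, and the function name `partial_corr` is an assumption of this sketch.

```python
import math


def partial_corr(r_1i, r_12, r_i2):
    """First-order partial correlation r_{1i.2}: X1 with Xi, controlling for X2."""
    return (r_1i - r_12 * r_i2) / math.sqrt((1.0 - r_12 ** 2) * (1.0 - r_i2 ** 2))


# r_16 and r_12 are the Table 5 correlations of the criterion with X6 and X2.
# The correlation between X6 and X2 is NOT reported in the text, so the value
# below is a hypothetical placeholder used only to exercise the function.
r_16, r_12 = 0.563, -0.658
r_62_hypothetical = -0.40

print(round(partial_corr(r_16, r_12, r_62_hypothetical), 3))
```

When the controlled variable is uncorrelated with both other variables, the function returns the simple correlation unchanged, which is a quick sanity check on the formula.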

An equivalent way of choosing the second variable to be included in the
analysis is to compute the multiple correlation coefficient $R_{1.2i}$
(i = 3, 4, ..., 14) for each possible two-variable regression model containing
the variable X2 and one additional variable Xi. The coefficient $R_{1.2i}$ is
computed using the formula

$$R_{1.2i}^2 = r_{12}^2 + r_{1i.2}^2\,(1 - r_{12}^2)$$

that is,

  variation explained by X2 and Xi = variation explained by X2
      + (additional variation explained by Xi) x (variation unexplained by X2)
The variable with the highest multiple correlation is the one with the
highest partial correlation. In addition, the variable with the highest
multiple correlation is the one with the highest F-value. The F-ratio is
computed using the formula

$$F_i = \frac{R_{1.2i}^2 - R_{1.2}^2}{(1 - R_{1.2i}^2)/(n-3)} = \frac{(n-3)(\text{Regr SS}_{2i} - \text{Regr SS}_2)}{\text{Resd SS}_{2i}}$$

which is distributed as F with 1 and n-3 degrees of freedom. The correlations
$r_{1i.2}$ and $R_{1.2i}$ are given, together with F-values, in Table 6. It
can be seen from the table that both correlations (partial and multiple) and
the F-value for X6 are the highest. Thus, X6 is the second variable included
in the analysis, since its F-value (21.63) exceeds the predetermined value, 4.00.
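
The second-step arithmetic can be checked with a short Python sketch. It assumes the rounded values printed in Tables 5 and 6 (0.433 for the squared correlation with X2 and 0.253 for the squared partial correlation of X6), so the results agree with the tabled figures only up to rounding. The function names are illustrative.

```python
def r2_with_added_variable(r2_first, partial_r2):
    """R^2 of the two-variable model from the first variable's r^2 and the
    squared partial correlation of the added variable."""
    return r2_first + partial_r2 * (1.0 - r2_first)


def f_for_added_variable(r2_full, r2_reduced, n, k_full):
    """F for the gain in R^2 when one variable is added (1 and n-k_full-1 df)."""
    return (r2_full - r2_reduced) / ((1.0 - r2_full) / (n - k_full - 1))


n = 67
r2_x2 = 0.433          # squared correlation of X1 with X2 (Table 5)
partial_r2_x6 = 0.253  # squared partial correlation of X6 given X2 (Table 6)

r2_x2_x6 = r2_with_added_variable(r2_x2, partial_r2_x6)
f_x6 = f_for_added_variable(r2_x2_x6, r2_x2, n, k_full=2)

print(round(r2_x2_x6, 3), round(f_x6, 1))
```

The computed R^2 matches the tabled .576, and the F-value lands within rounding distance of the 21.63 quoted in the text.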

Having included the variable X6 in the analysis, the next step in the
procedure is to examine whether the variable X2, included in the first step,
is needed for the regression equation any longer. This is done by first
regressing the criterion variable on X6, resulting in $R_{1.6}^2$, then
examining whether adding the variable X2 produces a significantly larger
coefficient $R_{1.26}^2$. The increase in prediction is measured by the F-ratio

$$F = \frac{R_{1.26}^2 - R_{1.6}^2}{(1 - R_{1.26}^2)/(n-3)} = \frac{(n-3)(\text{Regr SS}_{26} - \text{Regr SS}_6)}{\text{Resd SS}_{26}} = 39.22$$

Since this F-value is greater than the predetermined value of F = 4.00, the
variable X2 still contributes enough to be included in the analysis.
Having included X2 and X6, the step-wise procedure next computes $r_{1i.26}$
(i = 3, 4, 5, 7, 8, ..., 13, 14). These coefficients measure the relationship
between the criterion variable and each of the eleven remaining variables
while controlling for the variables X2 and X6, which are already included in
the analysis. The partial coefficients are listed in Table 7. It can be seen
from Table 7 that X8 has the highest partial coefficient, .2571. Since X8
has an F-value, 4.459, greater than the pre-set value of 4.00, it is the
third variable to be included in the analysis. Having included X8, the
procedure next examines whether X2 and X6 are needed any longer in the
analysis. X2 will be excluded from the analysis if the F-ratio

$$F = \frac{R_{1.268}^2 - R_{1.68}^2}{(1 - R_{1.268}^2)/(n-4)}$$

is smaller than the pre-set value, 4.0. Similarly, X6 will be excluded if
the F-ratio

$$F = \frac{R_{1.268}^2 - R_{1.28}^2}{(1 - R_{1.268}^2)/(n-4)}$$

is less than 4.0. In Table 7, the F-values for X2 and X6 are equal to 45.71
and 27.25 respectively. Since both of these values are greater than the
pre-set value, 4.0, X2 and X6 are retained in the analysis after the
inclusion of X8.
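
The third-step decisions can be retraced with the figures quoted above. The sketch below is illustrative Python, not the original program: it converts the partial correlation of the entering variable (.2571) into its F-to-enter and applies the pre-set value of 4.0 to the quoted F-to-remove values. The helper name `f_to_enter` and the dictionary layout are assumptions.

```python
def f_to_enter(partial_r, n, k_after):
    """F for adding one variable, computed from its partial correlation.

    k_after is the number of predictors in the model once the candidate
    has been added, so the residual degrees of freedom are n - k_after - 1.
    """
    df = n - k_after - 1
    return df * partial_r ** 2 / (1.0 - partial_r ** 2)


F_PRESET = 4.0
n = 67

# Step 3: the entering variable has partial correlation .2571, controlling
# for the two variables already in the model.
f_x8 = f_to_enter(0.2571, n, k_after=3)
print(round(f_x8, 3), f_x8 > F_PRESET)

# F-to-remove values quoted in the text for the two earlier variables.
f_to_remove = {"X2": 45.71, "X6": 27.25}
retained = [name for name, f in f_to_remove.items() if f >= F_PRESET]
print(retained)
```

The F-to-enter reproduces the quoted 4.459, and both earlier variables stay in the model because their F-to-remove values are far above 4.0.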

This procedure of inclusion of the next variable and exclusion of possible
variables already included continues until no new variable contributes enough
to the multiple correlation to be included in the regression model. Of
thirteen (13) predictor variables, only three variables, X2, X6 and X8,
contribute enough to the multiple correlation to be included in Model I. The
three variables from Model I were forced to remain in the prediction
equations in Models II, III and IV. This was necessary in order to make the
statistical comparison of the linear model and other models designed to
measure curvilinear relationships.
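
The inclusion and exclusion cycle described above can be sketched in plain Python. This is an illustrative reimplementation on synthetic data, not the program used for the study. The function names (`solve`, `resid_ss`, `stepwise`), the made-up predictors `x1` to `x3`, and the use of the same threshold 4.0 for entering and removing are all assumptions of the sketch.

```python
import random


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def resid_ss(y, columns):
    """Residual sum of squares of y regressed on an intercept plus columns."""
    design = [[1.0] * len(y)] + columns
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in design] for ci in design]
    xty = [sum(u * v for u, v in zip(ci, y)) for ci in design]
    beta = solve(xtx, xty)
    fitted = [sum(b * c[t] for b, c in zip(beta, design)) for t in range(len(y))]
    return sum((yt - ft) ** 2 for yt, ft in zip(y, fitted))


def stepwise(y, predictors, f_enter=4.0, f_remove=4.0):
    """Forward selection with a removal check after every inclusion."""
    n, chosen = len(y), []
    for _ in range(2 * len(predictors)):        # guard against cycling
        current = resid_ss(y, [predictors[v] for v in chosen])
        best, best_f = None, f_enter
        for v in predictors:                    # F-to-enter for each candidate
            if v in chosen:
                continue
            new = resid_ss(y, [predictors[u] for u in chosen + [v]])
            f = (current - new) / (new / (n - len(chosen) - 2))
            if f > best_f:
                best, best_f = v, f
        if best is None:                        # nothing contributes enough
            break
        chosen.append(best)
        full = resid_ss(y, [predictors[u] for u in chosen])
        df = n - len(chosen) - 1
        for v in chosen[:-1]:                   # F-to-remove for earlier entries
            reduced = resid_ss(y, [predictors[u] for u in chosen if u != v])
            if (reduced - full) / (full / df) < f_remove:
                chosen.remove(v)
                break                           # drop at most one per cycle
    return chosen


# Synthetic data: y depends on x1 and x2 only; x3 is pure noise.
random.seed(0)
n = 67
x = {name: [random.gauss(0, 1) for _ in range(n)] for name in ("x1", "x2", "x3")}
y = [2 * a - 3 * b + random.gauss(0, 1) for a, b in zip(x["x1"], x["x2"])]

selected = stepwise(y, x)
print(selected)
```

The F-to-enter uses n - k - 1 residual degrees of freedom for a model with k predictors after the candidate is added, which matches the n-2, n-3 and n-4 denominators used in the three steps of the text.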

Model II was developed by including the squared terms of each of the predictor
variables plus the forced linear terms X2, X6 and X8 from Model I. The new
variables included in Model II are the variable X4 and the square of X2,
denoted by X2·X2. Model III was investigated by including all possible
interaction terms and the variables X2, X6 and X8 from Model I. An interaction
variable is the product of two predictor variables, denoted by Xi·Xj, where
i, j = 2, 3, ..., 13, 14. The thirteen (13) predictor variables give rise to
78 possible interaction terms. Since the number of interaction variables XiXj
exceeds the number of cases (n = 67), interaction variables were
systematically analyzed in groups of 25 variables along with the variables
X2, X6 and X8. This was necessary in order to avoid overfitting the
regression equation. Model III included the interaction terms X2X5 and X5X10
and the three linear terms X2, X6 and X8. The fourth model included the
significant linear, quadratic, and interaction terms included in the previous
models. Namely, the variables X2, X4, X6, X8, X2X2, X2X5 and X5X10 were
included in Model IV.
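
To make the construction of the second-order candidates concrete, the following Python sketch enumerates the squared and pairwise product terms for thirteen predictors and splits the products into batches. The text does not say how the groups of 25 were formed, so the consecutive batching used here is an assumption, as are the names `product_column` and `GROUP_SIZE`. Thirteen predictors yield 13 x 12 / 2 = 78 distinct pairwise products.

```python
from itertools import combinations

predictors = [f"X{i}" for i in range(2, 15)]          # X2 ... X14

# Model II candidates: the square of every predictor.
squares = [(v, v) for v in predictors]

# Model III candidates: every distinct pair of predictors.
interactions = list(combinations(predictors, 2))


def product_column(data, pair):
    """Element-wise product of two predictor columns, e.g. X2*X5."""
    a, b = pair
    return [u * v for u, v in zip(data[a], data[b])]


# Assumed batching: consecutive groups of at most 25 interaction terms.
GROUP_SIZE = 25
groups = [interactions[k:k + GROUP_SIZE]
          for k in range(0, len(interactions), GROUP_SIZE)]

print(len(squares), len(interactions), [len(g) for g in groups])

# Tiny made-up data set, only to show how one product column is formed.
toy = {"X2": [1.0, 2.0, 3.0], "X5": [4.0, 5.0, 6.0]}
print(product_column(toy, ("X2", "X5")))
```

Each batch would then be offered to the step-wise procedure together with the three forced linear terms, so that the number of candidate terms stays below the number of districts.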

Comparison of Four Models

Since the purpose of this study was to investigate whether the inclusion of
square and/or interaction terms in a regression model would yield an improved
model in terms of predictability, the improvement was determined by comparing
the result from Model I against the results from Models II, III and IV.
There are several criteria which can be applied to make this comparison. One
of the most common criteria is to examine the square of the multiple
correlation coefficient, $R^2$, defined by

$$R^2 = \frac{\text{Sum of squares due to regression}}{\text{Total sum of squares}}$$

It is often stated as a percentage, $100 R^2$. The larger it is, the better
the fitted equation explains the variation in the data. The value of
$R^2$ resulting from each of the four models is compared in Table 8, which
shows a substantial increase in $R^2$ in the second-order model: from 60.4
percent for Model I to 69.2 percent for Model IV.

A second way of determining the predictability of the four models is
to compare the standard error of estimate, S, in relation to the mean of
the 67 observed scores. The value of S as a percentage of the mean,
$\bar{X}_1 = 58.4656$, for each of the four models is shown in Table 8.
Examination of this statistic indicates that the inclusion of curvilinear
effects in the linear model has reduced the standard error of estimate from
5.8 to about 5.3 percent of the mean of the observations.
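
The comparison statistics can be recomputed from the residual sums of squares. The Python sketch below takes R^2, the residual sum of squares and the number of terms for Models I and II as reported in Table 8 and derives the residual mean square, the standard error of estimate S, S as a percentage of the mean, and the overall F. The function name `fit_summary` is an assumption, and the results agree with the tabled values only up to rounding.

```python
import math


def fit_summary(r2, resid_ss, n, k, mean_y):
    """Summary statistics for a regression with k predictors on n cases."""
    df = n - k - 1                         # residual degrees of freedom
    mean_square = resid_ss / df
    s = math.sqrt(mean_square)             # standard error of estimate
    f = (r2 / k) / ((1.0 - r2) / df)       # overall F with k and df d.f.
    return {"df": df, "mean_square": mean_square, "s": s,
            "s_pct": 100.0 * s / mean_y, "f": f}


MEAN_SCORE = 58.4656   # mean of the 67 observed district scores
N = 67

model_1 = fit_summary(0.604, 729.56, N, k=3, mean_y=MEAN_SCORE)
model_2 = fit_summary(0.671, 606.09, N, k=5, mean_y=MEAN_SCORE)

for name, m in (("I", model_1), ("II", model_2)):
    print(name, m["df"], round(m["mean_square"], 2), round(m["s"], 2),
          round(m["s_pct"], 2), round(m["f"], 2))
```

A smaller S relative to the mean and a larger R^2 both point the same way here, since they are computed from the same residual sum of squares.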
Table 1

1972-73 Means and Standard Deviations for Grade 6 Mathematics

District No.    Mean Score    Standard Deviation

     1             56.3             1.02
     2             58.8             1.65
     3             62.6             0.84
     4             53.3             1.22
     5             69.6             0.67
     6             61.5             0.55
     7             66.1             1.26
     8             63.6             1.33
     9             60.6             1.07
    10             65.9             0.98
    11             56.9             0.99
    12             53.7             1.56
    13             64.4             0.28
    14             57.6             1.47
    15             46.0             1.97
    16             57.2             0.62
    17             61.9             0.66
    18             50.0             1.34
    19             59.0             1.42
    20             49.1             1.04
    21             67.9             1.86
    22              --               --
    23             59.2             1.49
    24             52.1             1.81
    25             50.0             1.23
    26             56.9             1.54
    27             62.0             1.06
    28             51.0             1.29
    29             60.1             0.59
    30             61.6             1.13
    31             58.3             0.84
    32             59.1             0.98
    33             47.0             1.69
    34             55.6             1.93
    35             60.3             1.02
    36             57.9             0.90
    37             58.1             1.06
    38             56.0             1.18
    39             56.9             1.87
    40             50.0             1.07
    41             61.6             0.89
    42             55.7             0.88
    43             54.9             1.21
    44             62.6             1.05
    45             59.7             1.28
    46             66.0             0.88
    47             55.1             1.39
    48             63.6             0.69
    49             62.0              --
    50             58.8             0.45
    51             62.8             1.04
    52             63.2             0.66
    53             62.2             0.67
    54             55.6             1.05
    55             56.9             0.92
    56             56.1             1.22
    57             65.6             0.87
    58             63.9             0.69
    59             64.1             0.98
    60             59.3             1.43
    61             55.6             1.12
    62             57.4             1.25
    63             55.0             1.72
    64             64.2             0.85
    65             52.9             1.64
    66             66.0             1.28
    67             54.9             1.47
Table 2: Description of Prediction Variables

Minority Enrollment. Percent of pupil enrollment that is non-white, Spanish
speaking, Oriental or American Indian.
Source: Quantitative Report, ACC-1, Accreditation Section, DOE.
Variable Number: X2 or simply 2. Variable Symbol: MNRE

Average Daily Attendance. The number of pupils in average daily membership in
grades K-12 for the year 1972-73.
Source: Quantitative Report, ACC-1, Accreditation Section, DOE.
Variable Number: X3 or simply 3. Variable Symbol: ADM

Poverty Level. Approximate percent of the student body from families with an
average annual income of less than $3,000.
Source: Quantitative Report, ACC-1, Accreditation Section, DOE.
Variable Number: X4 or simply 4. Variable Symbol: FMI

White Collar Occupation. Approximate percent of the student body from families
with 'white collar' occupations, which include professional, technical,
clerical and kindred workers. A more detailed example can be found in the
source.
Source: Quantitative Report, ACC-1, Accreditation Section, DOE.
Variable Number: X5 or simply 5. Variable Symbol: OCP

Average Family Income. The combined income of all families divided by the
number of families in the district.
Source: United States Census of Population, 1970: General Social and Economic
Characteristics, Florida Summary, Series PC(1)-C11, Bureau of Census,
United States Department of Commerce, April 1972.
Variable Number: X6 or simply 6. Variable Symbol: AVGI

Per Capita Income. This is the mean income computed for every man, woman, and
child in a particular group. It is derived by dividing the total income of a
particular group by the total population in that group.
Source: U.S. Census of Population, 1970, Series PC(1)-C11, Bureau of Census,
U.S. Department of Commerce.
Variable Number: X7 or simply 7. Variable Symbol: INCP

Housing. Percent increase in housing units, 1960-70.
Source: Florida Statistical Abstract, 1971, Bureau of Economic and Business
Research, University of Florida.
Variable Number: X8 or simply 8. Variable Symbol: HSNG

School Education. This is the median school years completed for
the population 25 years of age and older of the district.
Source: U.S. Census of Population, 1970, Series PC(1)-C11, Bureau of Census,
U.S. Department of Commerce.
Variable Number: X9 or simply 9. Variable Symbol: SCHED

College Education. Percent of 1970 male population with 1 to 3 years of
college completed.
Source: U.S. Department of Commerce, Bureau of Census, PC(1)-C11.
Variable Number: X10 or simply 10. Variable Symbol: COLED

Post College Ed. Percent of 1970 male population with 4 or more years of
college completed.
Source: U.S. Department of Commerce, Bureau of the Census, PC(1)-C11.
Variable Number: X11 or simply 11. Variable Symbol: CMD

Percent of Population Classified as Urban. The percent of the district's
total resident population living in urban places and urban areas according to
the 1970 census.
Source: U.S. Department of Commerce, Bureau of the Census, PC(1)-C11.
Variable Number: X12 or simply 12. Variable Symbol: URBN

Sixty-five Years and Over. The percent of 1970 population with 65 years and
over.
Source: U.S. Department of Commerce, Bureau of Census, PC(1)-B11.
Variable Number: X13 or simply 13. Variable Symbol: SXTY

Free Lunch. Approximate percent of the student body receiving free or reduced
lunch.
Source: Food and Nutrition Service, Florida Department of Education.
Variable Number: X14 or simply 14. Variable Symbol: LNCH



Table 3

Means and Standard Deviations of the Predictor Variables in Table 2

Variable Name                  Mean     Standard Deviation

Minority Enrollment           24.57            5.35
Average Daily Attendance      24.01           14.96
Poverty Index                 23.40           42.90
White Collar Occupation       27.90           13.80
Average Family Income          6.23           13.13
Per Capita Income              2.47            1.63
Housing                       39.50            0.60
School Education              10.80           35.80
College Education              8.60            1.30
Post College Education         9.40            3.50
Urban                         42.80            5.50
Sixty-Five Years              13.60           30.70
Free Lunch                    38.70            6.80
Table 4

Four Regression Models

Model I (First-Order):

$$X_1 = \beta_1 + \beta_2 X_2 + \beta_3 X_3 + \cdots + \beta_{14} X_{14} + \varepsilon_1$$

Model II (Quadratic):

$$X_1 = \beta_1 + \beta_2 X_2 + \beta_3 X_3 + \cdots + \beta_{14} X_{14} + \sum_{i=2}^{14} \beta_{ii} X_i^2 + \varepsilon_2$$

Model III (Interaction):

$$X_1 = \beta_1 + \beta_2 X_2 + \beta_3 X_3 + \cdots + \beta_{14} X_{14} + \sum_{i<j} \beta_{ij} X_i X_j + \varepsilon_3$$

Model IV (Second-Order):

$$X_1 = \beta_1 + \beta_2 X_2 + \beta_3 X_3 + \cdots + \beta_{14} X_{14} + \sum_{i \le j} \beta_{ij} X_i X_j + \varepsilon_4$$

Where: $X_1$ is the criterion variable.

$\beta_1, \beta_2, \ldots, \beta_{14}$ and $\beta_{ij}$ are unknown regression
coefficients. These are estimated by the quantities $b_1, b_2, \ldots, b_{14}$
and $b_{ij}$ by requiring the error sum of squares to be minimized.

$X_2, X_3, \ldots, X_{14}$ are the values of the predictor variables.

$X_i X_j$ (i, j = 2, 3, ..., 14) is the product of the value corresponding
to $X_i$ and the value corresponding to $X_j$.

And $\varepsilon_i$ is the residual for Model i.
Table 5

Data for selection of variables in Step #1

                                 Variable numbers as listed in Table 2

                            2      3      4      5      6      7      8      9     10     11     12     13     14

r_1.i                   -.658   .325  -.617   .531   .563   .437     --     --     --   .368   .416   .152  -.650
r_1.i^2                  .433   .106   .381   .282   .317   .191   .075   .231   .226   .135   .173   .023   .423
1 - r_1.i^2              .567   .894   .619   .718   .683   .809   .925   .769   .773   .865   .827   .977   .577
r_1.i^2/(1 - r_1.i^2)    .764   .118   .615   .393   .464   .236   .081   .299   .293   .156   .209   .024   .732
F_i                      49.6   7.67   39.9   25.5   30.2   15.3   5.23   19.5   19.1   10.1   13.6   1.53   47.5

Table 6

Data for selection of variables in Step #2

                          Variable numbers as listed in Table 2

                     3      4      5      6      7      8      9     10     11     12     13     14

r_1i.2            .415  -.397   .382   .503   .346   .008   .414   .499   .404   .435   .038  -.309
r_1i.2^2          .173   .157   .146   .253   .119   .001   .172   .249   .163   .189   .002   .095
R_1.2i^2          .531   .522   .516   .576   .501   .433   .531   .574   .525   .541   .434   .487
F_i               13.3   11.9   10.9   21.6   8.74   0.01   13.2   21.3   12.5   15.0   0.09   6.78



Table 7

Data for selection of variables in Step #3

MULTIPLE R    0.7771

ANALYSIS OF VARIANCE
                 DF    SUM OF SQUARES    MEAN SQUARE    F RATIO
  REGRESSION      3          --            370.805       32.02
  RESIDUAL       63        729.561          11.58

VARIABLES IN EQUATION
  VARIABLE    COEFFICIENT    STD. ERROR    F TO REMOVE
  X2              --             --          45.7085
  X6              --             --          27.2504
  X8              --             --           4.4594

VARIABLES NOT IN EQUATION
  VARIABLE        PARTIAL CORR.    TOLERANCE    F TO ENTER
  ADA       3          --             --            --
  FMLINC    4          --             --            --
  OCUPTN    5          --             --            --
  INCPTA    7          --           0.1679          --
  --        9          --             --          0.0919
  COLGED   10          --             --          2.3947
  COLGPO   11        0.05041          --            --
  URBAN    12        0.07637          --            --
  SIXTY5   13        0.00985          --            --
  FRLNCH   14          --             --            --
Table 8

Data for Comparison of Four Regression Models

                                                      Residual                    Standard error of est.
Model  Variables in the model             100 R^2  Sum of    DF   Mean      S      S as % of     F-value
                                                   Squares        Square           mean (58.46)

I      X2, X6, X8                          60.4    729.56    63   11.58    3.41      5.80         32.02
II     X2, X4, X6, X8, X2X2                67.1    606.09    61    9.94    3.15      5.39         24.88
III    X2, X6, X8, X2X5, X5X10             65.9    627.63    61   10.29    3.21      5.49         23.61
IV     X2, X4, X6, X8, X2X2, X2X5, X5X10   69.2    557.99    59    9.63    3.10      5.30         24.88